Kinetic Theory of Random Graphs: from Paths to Cycles

Structural properties of evolving random graphs are investigated. Treating linking as a dynamic
aggregation process, rate equations for the distribution of node to node distances (paths) and of
cycles are formulated and solved analytically. At the gelation point, the typical length of paths and
cycles, $l$, scales with the component size $k$ as $l \sim k^{1/2}$. Dynamic and finite-size scaling laws for
the behavior at and near the gelation point are obtained. Finite-size scaling laws are verified using
numerical simulations.

I. INTRODUCTION

A random graph is a set of nodes that are randomly joined by links. When
there are sufficiently many links, a connected component containing a finite
fraction of all nodes, the so-called giant component, emerges. Random graphs,
with varying flavors, arise naturally in statistical physics, chemical
physics, combinatorics, probability theory, and computer science.

Several physical processes and algorithmic problems are essentially
equivalent to random graphs. In gelation, monomers form polymers via chemical
bonds until a giant polymer network, a "gel", emerges. Identifying monomers
with nodes and chemical bonds with links shows that gelation is equivalent to
the emergence of a giant component. A random graph is also the most natural
mean-field model of percolation. In computer science, satisfiability, in its
simplest form, maps onto a random graph. Additionally, random graphs are used
to model social networks.

Random graphs have been analyzed largely using combinatorial and
probabilistic methods. An alternative statistical physics methodology is
kinetic theory, or equivalently, the rate equation approach. The formation of
connected components from disconnected nodes can be treated as a dynamic
aggregation process. This kinetic approach was used to derive primarily the
size distribution of components [18, 19, 20].

Recently, we have shown that structural characteristics 
of random graphs can be analyzed using the rate equation 
approach [21]. In this study, we present a comprehensive
treatment of paths and cycles in evolving random
graphs. The rate equation approach is formulated by 
treating linking as a dynamic aggregation process. This 
approach allows an analytic calculation of the path length 
distribution. Since a cycle is formed when two connected 
nodes are linked, the path length distribution yields the 
cycle length distribution. More subtle statistical proper- 
ties of cycles in random graphs can be calculated as well. 



In particular, the probability that the system contains no 
cycles and the size distribution of the first, second, etc. 
cycles are obtained analytically. 

We focus on the behavior near and at the phase tran- 
sition point, namely, when the gel forms. We show that 
the path and the cycle length distribution approach self- 
similar distributions near the gelation transition. At the 
gelation point, these distributions develop algebraic tails. 

The exact results obtained for an infinite system allow 
us to deduce scaling laws for finite systems. Using heuris- 
tic and extreme statistics arguments, the size of the giant 
component at the gelation point is obtained. This size 
scale characterizes the size distribution of components 
and it leads to a number of scaling laws for the typical 
path size and cycle size. Extensive numerical simulations 
validate these scaling laws for finite systems. 

The rest of the paper is organized as follows. First, the 
evolving random graph process is introduced (Sec. II), 
and then the size distribution of all components is ana- 
lyzed in Sec. III. Statistical properties of paths are de- 
rived in Sec. IV and then used to obtain statistical prop- 
erties of all cycles (Sec. V) and of the first cycle (Sec. VI). 
We conclude in Sec. VII. Finally, in an appendix, some 
details of contour integration used in the body of the 
paper are presented. 



II. EVOLVING RANDOM GRAPHS 

A graph is a collection of nodes joined by links. In 
a random graph, links are placed randomly. Random 
graphs may be realized in a number of ways. The links 
may be generated instantaneously (static graph) or se- 
quentially (evolving graph); additionally a given pair of 
nodes may be connected by at most a single link (simple 
graph) or by multiple links (multi-graph). 

We consider the following version of the random graph model. Initially,
there are $N$ disconnected nodes. Then, a pair of nodes is selected at random
and a link is placed between them (Fig. 1). This linking process continues ad
infinitum and it creates an evolving random graph. The process is realized
dynamically. Links are generated with a constant rate in time, set equal to
$(2N)^{-1}$ without loss of generality. There are no restrictions associated
with the identity of the two nodes. A pair of nodes may be selected multiple
times, i.e., a multi-graph is created. Additionally, the two nodes need not be
different, so self-connections are allowed.

FIG. 1: An evolving random graph. Links are indicated by solid lines and the
newly added link by a dashed line.

At time $t$, the total number of links is on average $Nt/2$, the average
number of links per node (the degree) is $t$, and the average number of
self-connections per node is $N^{-1}t/2$. Therefore, whether or not
self-connections are allowed is a secondary issue. Since the linking process is
completely random, the degree distribution is Poissonian with a mean equal to
$t$.



III. COMPONENTS 

The evolving random graph model has several virtues 
that simplify the analysis. First, the linking process 
is completely random as there is no memory of previ- 
ous links. Second, having at hand a continuous variable 
(time) allows us to use continuum methods, particularly 
the rate equation approach. This is best demonstrated 
by determination of the size distribution of connected 
components. 

As linking proceeds, connected components form. When a link is placed
between two distinct components, the two components join. For example, the
latest link in Fig. 1 joins two components of size $i = 2$ and $j = 4$ into a
component of size $k = i + j = 6$. Generally, there are $i \times j$ ways to
join two disconnected components of sizes $i$ and $j$. Hence, components
undergo the following aggregation process

$$(i, j) \to i + j. \tag{1}$$

Two components aggregate with a rate proportional to the product of their
sizes.

A. Infinite Random Graph 

Let $c_k(t)$ be the density of components containing $k$ nodes at time $t$.
In terms of $N_k(t)$, the total number of components with $k$ nodes,
$c_k(t) = N_k(t)/N$. For finite random graphs, both $N_k(t)$ and $c_k(t)$ are
random variables, but in the $N \to \infty$ limit the density $c_k(t)$ becomes
a deterministic quantity. It evolves according to the nonlinear rate equation
(the explicit time dependence is dropped for simplicity)

$$\frac{dc_k}{dt} = \frac{1}{2} \sum_{i+j=k} (i c_i)(j c_j) - k\, c_k. \tag{2}$$

The initial condition is $c_k(0) = \delta_{k,1}$. The gain term accounts for
components generated by joining two smaller components whose sizes sum up to
$k$. The second term on the right-hand side of Eq. (2) represents loss due to
linking of components of size $k$ to other components. The corresponding gain
and loss rates follow from the aggregation rule (1).

The rate equations can be solved using a number of techniques. Throughout
this investigation, we use a convenient method in which the time dependence is
eliminated first. Solving the rate equations recursively yields
$c_1 = e^{-t}$, $c_2 = \frac{1}{2} t e^{-2t}$, $c_3 = \frac{1}{2} t^2 e^{-3t}$,
etc. These explicit results suggest that $c_k(t) = C_k\, t^{k-1} e^{-kt}$.
Substituting this form into (2), we find that the coefficients $C_k$ satisfy
the recursion relation

$$(k-1)\, C_k = \frac{1}{2} \sum_{i+j=k} (i C_i)(j C_j) \tag{3}$$

subject to $C_1 = 1$. This recursion is solved using the generating function
approach. The form of the right-hand side of Eq. (3) suggests utilizing the
generating function of the sequence $k C_k$ rather than $C_k$, i.e.,
$G(z) = \sum_k k\, C_k\, e^{kz}$. Multiplying Eq. (3) by $k e^{kz}$ and summing
over all $k$, we find that the generating function satisfies the nonlinear
ordinary differential equation

$$(1 - G)\, \frac{dG}{dz} = G. \tag{4}$$

Integrating this equation, $z = \ln G - G + A$, and using the asymptotics
$G \to e^{z}$ as $z \to -\infty$ fixes the constant $A = 0$. Thus, we arrive at
an implicit solution for the generating function

$$G\, e^{-G} = e^{z}. \tag{5}$$



The coefficients $C_k$ can be extracted from (5) via the Lagrange inversion
formula, or using contour integration as detailed in Appendix A. Substituting
$r = 1$ in Eq. (A1) yields $C_k = k^{k-2}/k!$, reproducing the well-known
result for the size distribution

$$c_k(t) = \frac{k^{k-2}}{k!}\, t^{k-1} e^{-kt}. \tag{6}$$
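The recursion (3) and the closed form $C_k = k^{k-2}/k!$ can be cross-checked with exact rational arithmetic. The following is a minimal sketch rather than part of the original analysis; the function name `coefficients` and the cutoff of 12 are illustrative choices.

```python
from fractions import Fraction
from math import factorial


def coefficients(kmax):
    """Iterate (k-1) C_k = (1/2) sum_{i+j=k} (i C_i)(j C_j) with C_1 = 1."""
    C = {1: Fraction(1)}
    for k in range(2, kmax + 1):
        total = sum(i * C[i] * (k - i) * C[k - i] for i in range(1, k))
        C[k] = Fraction(1, 2) * total / (k - 1)
    return C


C = coefficients(12)
# Closed form quoted in the text: C_k = k^(k-2) / k!
closed = {k: Fraction(k) ** (k - 2) / factorial(k) for k in range(1, 13)}
print(all(C[k] == closed[k] for k in C))
```

Because the arithmetic is exact, agreement here is an identity for every tested $k$, not a floating-point coincidence.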



In the following, we shall often use the generating function for the size
distribution $c(z, t) = \sum_k k\, c_k(t)\, e^{kz}$. This generating function
is readily expressed via the auxiliary generating function
$G(z) = \sum_k k\, C_k\, e^{kz}$:

$$c(z, t) = t^{-1}\, G(z + \ln t - t). \tag{7}$$

Let us consider the fraction of nodes in finite components,
$M_1 = \sum_k k\, c_k(t)$. This quantity is merely the first moment of the size
distribution (hence the notation). Equivalently, $M_1 = c(z = 0, t)$. From (7)
we find $M_1 = \tau/t$ with $\tau = G(\ln t - t)$. Using (5), we express $\tau$
through $t$:

$$\tau\, e^{-\tau} = t\, e^{-t}. \tag{8}$$

For $t < 1$, there is a single root $\tau = t$, and all nodes reside in finite
components, $M_1 = 1$. For $t > 1$ the physical root satisfies $\tau < t$ and
only a fraction of the nodes resides in finite components, $M_1 < 1$. Thus, at
time $t = 1$, the system undergoes a gelation transition with a finite fraction
of the nodes contained in infinite components. We term this time the gelation
time, $t_g = 1$. In the late stages of the evolution, $t \gg 1$, one has
$\tau \simeq t e^{-t}$ and $M_1 \simeq c_1 = e^{-t}$, so the system consists of
a single giant component and a small number of isolated nodes.
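The root of Eq. (8) has no elementary closed form beyond the gelation time, but it is easy to obtain numerically. The sketch below uses plain bisection on the branch $0 \le \tau \le 1$; the helper names `tau` and `finite_fraction` and the tolerance are assumptions made for illustration.

```python
import math


def tau(t, tol=1e-14):
    """Physical root of tau * exp(-tau) = t * exp(-t), Eq. (8)."""
    if t <= 1.0:
        return t  # before gelation the physical root is tau = t
    target = t * math.exp(-t)
    lo, hi = 0.0, 1.0  # x * exp(-x) increases monotonically on [0, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def finite_fraction(t):
    """M_1 = tau / t, the fraction of nodes in finite components."""
    return tau(t) / t if t > 0 else 1.0


for t in (0.5, 1.0, 1.5, 2.0, 4.0, 8.0):
    print(t, finite_fraction(t), math.exp(-t))
```

The last column prints $e^{-t}$ so that the late-time behavior $M_1 \simeq e^{-t}$ can be compared by eye.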

The behavior at and near the transition point is of
special interest. The critical behavior of the component 
size distribution is echoed by other quantities as will be 
shown below. Size distributions become algebraic near 
the critical point. Moreover, there is a self-similar be- 
havior as a function of time (dynamical scaling) and as 
a function of the system size (finite-size scaling). 

At the gelation point, the component size distribution has an algebraic
large-size tail, obtained using the Stirling formula,

$$\mathbf{c}_k \simeq C\, k^{-5/2} \tag{9}$$

with $C = (2\pi)^{-1/2}$. [Throughout this paper, bold letters are used for
critical distributions, so $\mathbf{c}_k = c_k(t = 1)$.] In the vicinity of the
gelation time, the size distribution is self-similar,
$c_k(t) \to (1 - t)^5\, \Phi_c\big(k(1 - t)^2\big)$, with the scaling function

$$\Phi_c(\xi) = (2\pi)^{-1/2}\, \xi^{-5/2} \exp(-\xi/2). \tag{10}$$

Thus, the characteristic component size diverges near the gelation point,
$k \sim (1 - t)^{-2}$.
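How quickly the exact critical distribution approaches the algebraic tail (9) can be checked numerically. This is a small sketch under the assumption that $c_k(t=1) = k^{k-2} e^{-k}/k!$, as given by Eq. (6); the function names are illustrative, and the evaluation is done in log space to avoid overflow.

```python
import math


def critical_density(k):
    """Exact c_k at t = 1: k^(k-2) e^(-k) / k!, evaluated in log space."""
    log_c = (k - 2) * math.log(k) - k - math.lgamma(k + 1)
    return math.exp(log_c)


def stirling_tail(k):
    """Asymptotic tail (2 pi)^(-1/2) k^(-5/2) of Eq. (9)."""
    return k ** -2.5 / math.sqrt(2.0 * math.pi)


for k in (10, 100, 1000, 10000):
    print(k, critical_density(k) / stirling_tail(k))
```

The printed ratio approaches one from below as $k$ grows, as expected from the leading correction to the Stirling formula.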



B. Finite Random Graphs 

In the previous subsection, we applied kinetic theory 
to an infinite system. This approach can be extended 
to finite systems. Unfortunately, such treatments are 
very cumbersome [22]. Since the number of components
is finite, the fluctuations are no longer negligible,
and instead of a deterministic rate equation approach, a 
stochastic approach is needed. Here we follow an alter- 
native path, employing the exact infinite system results 
in conjunction with scaling and extreme statistics argu- 
ments. 

The characteristic size of components at the gelation point exhibits
nontrivial dependence on the system size. This is conveniently seen via the
cumulative size distribution. The size of the largest component in the system,
$k_g$, is estimated from the extreme statistics criterion,
$N \sum_{k \ge k_g} \mathbf{c}_k \sim 1$, to be

$$k_g \sim N^{2/3}. \tag{11}$$

The largest component in the system grows sub-linearly with the system size.
The time by which this component emerges approaches unity for large enough
systems, as follows from the diverging characteristic size scale
$k_g \sim (1 - t_g)^{-2}$,

$$1 - t_g \sim N^{-1/3}. \tag{12}$$
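For completeness, the two estimates above follow from the critical tail (9) in a couple of lines. As a sketch, replacing the sum by an integral gives

```latex
N \sum_{k \ge k_g} \mathbf{c}_k
  \simeq N \int_{k_g}^{\infty} \frac{dk}{\sqrt{2\pi}\, k^{5/2}}
  = \frac{2N}{3\sqrt{2\pi}}\, k_g^{-3/2} \sim 1
  \quad \Longrightarrow \quad k_g \sim N^{2/3},
\qquad
(1 - t_g)^{-2} \sim N^{2/3}
  \quad \Longrightarrow \quad 1 - t_g \sim N^{-1/3}.
```

Numerical prefactors are dropped because only the scaling with $N$ matters here.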



The maximal component size (11) underlies the entire size distribution. Let
$c_k(N, t)$ be the size distribution in a system of size $N$ at time $t$. At
the gelation point, the size distribution $c_k(N) = c_k(N, t = 1)$ obeys the
finite-size scaling form (Figs. 2 and 3)

$$c_k(N) \sim N^{-5/3}\, \Psi_c\big(k N^{-2/3}\big). \tag{13}$$

The scaling function has the following extremal behaviors

$$\Psi_c(\xi) \simeq \begin{cases} (2\pi)^{-1/2}\, \xi^{-5/2} & \xi \ll 1; \\ \exp(-\xi^{\gamma}) & \xi \gg 1. \end{cases} \tag{14}$$



The small-$\xi$ behavior corresponds to sizes well below the characteristic
size and thus reflects the infinite system behavior (9). The large-$\xi$
behavior was obtained numerically with $\gamma = 3$. To appreciate the
large-$\xi$ asymptotic, let us estimate the probability that the system managed
to generate the largest possible component of size $N/2$ at time $t = 1$. The
lower bound for this probability can be established via a "greedy" evolution
which assumes that after $k$ linking events the graph is composed of a tree of
size $k + 1$ and $N - k - 1$ disconnected nodes. Such evolution occurs with
probability

$$\frac{2}{N}\, \frac{N-2}{N} \times \frac{3}{N}\, \frac{N-3}{N} \times \cdots \times \frac{N - N/2}{N}\, \frac{N/2}{N} \sim \frac{N!}{N^N}$$

that scales as $e^{-N}$. While this lower bound is not necessarily optimal, it
suggests that the actual probability is exponentially small. The scaling
variable $\xi = k N^{-2/3}$ becomes $\xi \sim N^{1/3}$ for $k = N/2$, so
$\exp(-N^{\gamma/3})$ matches the probability $\exp(-N)$ when $\gamma = 3$.

To check the critical behavior in finite systems, we performed numerical
simulations. In the simulations, $N/2$ links are placed randomly and
sequentially among the $N$ nodes as follows. A node is drawn randomly, and then
another node is drawn randomly. Last, these two nodes are linked.
Self-connections are therefore allowed. The simulations differ slightly from
the above random graph model in that the number of links is not a stochastic
variable. For large $N$, this simulation is faithful to the evolving random
graph model because the number of links is self-averaging.
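The simulation procedure just described can be sketched in a few lines. The union-find bookkeeping below is an assumed implementation detail, not something specified in the text, and the function name `simulate_components`, the seed, and the system size are illustrative. Only component sizes are tracked, so links that fall inside an existing component are simply skipped.

```python
import random


def simulate_components(n_nodes, n_links, seed=0):
    """Place n_links random links among n_nodes; return component sizes."""
    rng = random.Random(seed)
    parent = list(range(n_nodes))
    size = [1] * n_nodes

    def find(x):
        # Iterative find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(n_links):
        a = find(rng.randrange(n_nodes))
        b = find(rng.randrange(n_nodes))  # may equal a: self-links, cycles
        if a == b:
            continue  # link inside a component: sizes are unchanged
        if size[a] < size[b]:
            a, b = b, a
        parent[b] = a
        size[a] += size[b]

    return sorted((size[r] for r in range(n_nodes) if parent[r] == r),
                  reverse=True)


N = 100_000
sizes = simulate_components(N, N // 2)  # N/2 links corresponds to t = 1
print(sizes[0], N ** (2 / 3))
```

Averaging a histogram of `sizes` over many seeds would give an estimate of $c_k(N)$; a single run only indicates that the largest component is of the order of $N^{2/3}$ rather than $N$.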

The simulation results are consistent with the postulated finite-size
scaling form (13). We note that the scaling function $\Psi_c(\xi)$ converges
slowly as a function of $N$. The simulations reveal an interesting behavior of
the finite-size scaling function. The function $c_k(N)$ has a "shoulder", a
non-monotonic behavior compared with the pure algebraic behavior (9)
characterizing infinite systems (Fig. 2). The properly normalized scaling
function $(2\pi\xi^5)^{1/2}\, \Psi_c(\xi)$ is a non-monotonic function of $\xi$
(Fig. 3). Obtaining the full functional form of the scaling function
$\Psi_c(\xi)$ remains a challenge. A very similar shoulder has been observed
for the degree distribution of finite random networks generated by
preferential attachment.

FIG. 2: The size distribution for a finite system at the gelation point.
Shown is $c_k(N)$ versus $k$ for various $N$. The infinite system behavior is
shown for reference. The data represents an average over 10^ independent
realizations.

FIG. 3: Finite-size scaling of the size distribution. Shown is
$(2\pi\xi^5)^{1/2}\, \Psi_c(\xi)$ versus $\xi$ obtained from simulations with
various $N$.

IV. PATHS

Structural characteristics of components can be investigated in a similar
fashion. By definition, every two nodes in a component are connected. In other
words, there is a path consisting of adjacent links between two such nodes. We
investigate statistical properties of paths in components. Characterization of
paths yields useful information regarding the connectivity of components as
well as internal structures such as cycles.

For every node in the graph, there are (generally) multiple paths that
connect it with all other nodes in the respective component. With new links,
new paths are formed. For every pair of paths of lengths $n$ and $m$
originating at two separate nodes, a new path is formed as follows

$$n, m \to n + m + 1. \tag{15}$$

In Fig. 1, linking two paths of respective lengths $n = 1$ and $m = 2$
generates a path of length $n + m + 1 = 4$. Thus, paths also undergo an
aggregation process. However, this aggregation process is simpler than (1)
because the aggregation rate is independent of the path length.

Let $q_l(t)$ be the density of distinct paths containing $l$ links at time
$t$. By distinct we mean that the two paths connecting two nodes are counted
separately. By definition, $q_0(t) = 1$. The rest of the densities grow
according to the rate equation

$$\frac{dq_l}{dt} = \sum_{n+m=l-1} q_n\, q_m \tag{16}$$

for $l > 0$. The initial condition is $q_l(0) = \delta_{l,0}$. This rate
equation reflects the uniform aggregation rate. Another notable feature is the
lack of a loss term: once a path is created, it remains forever. Solving
recursively gives $q_1 = t$, $q_2 = t^2$, etc. By induction, the path length
density is

$$q_l(t) = t^l. \tag{17}$$

Indeed, this expression satisfies both the rate equation and the initial
condition. The first quantity $q_1 = t$ is consistent with the facts that the
link density is equal to $t/2$ and that every link corresponds to two distinct
paths of length one.
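The direct verification mentioned above takes one line. With $q_l = t^l$, both sides of the rate equation (16) reduce to the same monomial, since the sum over $n + m = l - 1$ contains exactly $l$ identical terms:

```latex
\frac{d}{dt}\, t^{l} = l\, t^{l-1},
\qquad
\sum_{n+m=l-1} q_n\, q_m = \sum_{n=0}^{l-1} t^{n}\, t^{\,l-1-n} = l\, t^{l-1},
\qquad
q_l(0) = \delta_{l,0}.
```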

The above path density represents an aggregate over all nodes and all
components. Characterization of path statistics in a component of a given size
is achieved via $p_{l,k}$, the density of paths of length $l$ in components of
size $k$. Note the obvious length bounds $0 \le l \le k - 1$ and the sum rule
$\sum_l p_{l,k} = k^2 c_k$ reflecting that there are $k^2$ distinct paths in a
component of size $k$ (every pair of nodes is connected). The density of the
linkless paths is $p_{0,k} = k c_k$, because $k c_k$ is the probability that a
node belongs to a component of size $k$.

We have seen that components and paths form via the aggregation processes
(1) and (15), respectively. The joint distribution $p_{l,k}$ therefore
undergoes a bi-aggregation process [23]. In the present case,

$$(n, i) + (m, j) \to (n + m + 1,\, i + j) \tag{18}$$

where the first index corresponds to the path length and the second to the
component size. The joint distribution evolves according to the rate equation

$$\frac{dp_{l,k}}{dt} = \sum_{\substack{i+j=k \\ n+m=l-1}} p_{n,i}\, p_{m,j} + \sum_{i+j=k} (i\, p_{l,i})(j\, c_j) - k\, p_{l,k}. \tag{19}$$

The initial conditions are $p_{l,k}(0) = \delta_{k,1}\, \delta_{l,0}$. The
first term on the right-hand side of Eq. (19) describes newly formed paths due
to linking. The last two terms correspond to paths that do not contain the
newly placed link.

We now repeat the steps used to determine the size distribution. The time
dependence is eliminated using the ansatz
$p_{l,k} = P_{l,k}\, t^{k-1} e^{-kt}$. The corresponding coefficients $P_{l,k}$
satisfy the recursion

$$(k - 1)\, P_{l,k} = \sum_{\substack{i+j=k \\ n+m=l-1}} P_{n,i}\, P_{m,j} + \sum_{i+j=k} (i\, P_{l,i})(j\, C_j). \tag{20}$$

The generating function $P_l(z) = \sum_k P_{l,k}\, e^{kz}$ satisfies the
recursion relation
$(1 - G)\, \frac{dP_l}{dz} = \sum_{n+m=l-1} P_n P_m + P_l$ for $l > 0$.
Dividing this equation by (4) yields

$$G\, \frac{dP_l}{dG} = \sum_{n+m=l-1} P_n\, P_m + P_l \tag{21}$$

for $l > 0$. As noted above $P_{0,k} = k C_k$, so $P_0(z) = G(z)$. Solving
Eq. (21) recursively gives $P_1 = G^2$, $P_2 = G^3$, etc. In general,

$$P_l(z) = G^{l+1}(z). \tag{22}$$

This solution can be validated directly. The time dependent generating
function $p_l(z) = \sum_k p_{l,k}\, e^{kz}$ is therefore
$p_l(z) = t^{-1}\, G^{l+1}(z + \ln t - t)$. The total density of paths of
length $l$, $p_l(z = 0) = t^l$, coincides with (17) prior to the gelation
transition ($t < 1$) because all components are finite. However, the total
number of paths is reduced, $p_l(z = 0) = t^{-1} \tau^{l+1}$, past the gelation
time ($t > 1$).

One may also obtain the bivariate generating function
$p(z, w) = \sum_{l,k} p_{l,k}\, w^l e^{kz}$. Using (22) one gets

$$p(z, w) = t^{-1}\, \frac{G(z + \ln t - t)}{1 - w\, G(z + \ln t - t)}. \tag{23}$$

The total density of paths in finite components is of course
$g = \sum_{l,k} p_{l,k}$, so $g = p(z = 0, w = 1)$. Generally,
$g = \frac{\tau}{t(1 - \tau)}$; for $t < 1$ the total density of paths is
$g(t) = (1 - t)^{-1}$.

The coefficients are found via the contour integration
$P_{l,k} = (2\pi i)^{-1} \oint dy\, P_l\, y^{-k-1}$ (see Appendix A).
Substituting $r = l + 1$ in Eq. (A1) yields
$P_{l,k} = (l + 1)\, \frac{k^{k-l-2}}{(k-l-1)!}$. As a result, the density of
paths of length $l$ in components of size $k$ is

$$p_{l,k} = (l + 1)\, \frac{k^{k-l-2}}{(k-l-1)!}\, t^{k-1} e^{-kt}. \tag{24}$$

Comparing (24) and (6) we notice that the densities of the two shortest
paths satisfy $p_{0,k} = k c_k$ and $p_{1,k} = 2(k - 1) c_k$. The latter
reflects that there are $k - 1$ links in a tree of size $k$ and that with unit
probability all components are trees (as discussed in the next section).

Note also that the longest possible path, $l = k - 1$, corresponds to linear
(chain-like) components. According to Eq. (24), the density of such paths is
$p_{k-1,k} = t^{k-1} e^{-kt}$. This density decays exponentially with length,
so these components are typically small, their length being of the order one.
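The sum rule $\sum_l p_{l,k} = k^2 c_k$ and the relation $p_{1,k} = 2(k-1) c_k$ can be confirmed with exact arithmetic, since the common factor $t^{k-1} e^{-kt}$ drops out. The following sketch compares only the time-independent coefficients; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def path_coefficient(l, k):
    """P_{l,k} = (l+1) k^(k-l-2) / (k-l-1)!, the coefficient in Eq. (24)."""
    return (l + 1) * Fraction(k) ** (k - l - 2) / factorial(k - l - 1)


def size_coefficient(k):
    """C_k = k^(k-2) / k!, the coefficient in Eq. (6)."""
    return Fraction(k) ** (k - 2) / factorial(k)


for k in range(1, 9):
    total = sum(path_coefficient(l, k) for l in range(k))
    print(k, total == k * k * size_coefficient(k))
```

Each printed flag compares the sum over all admissible path lengths $0 \le l \le k-1$ with $k^2 C_k$.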

The path length density can be simplified in the large $k$-limit by
considering the properly normalized ratio of factorials

$$\frac{k!}{k^l\, (k - l)!} = \left(1 - \frac{1}{k}\right)\left(1 - \frac{2}{k}\right) \cdots \left(1 - \frac{l-1}{k}\right) \to \exp(-l^2/2k).$$

Using the Stirling formula, in the limits $k \gg 1$ and $l \gg 1$, the path
density becomes

$$p_{l,k} \simeq l\, (2\pi k^3)^{-1/2}\, t^{k-1} e^{k(1-t)}\, e^{-l^2/2k}. \tag{25}$$

As was the case for the component size distribution, the path length density
is self-similar in the vicinity of the gelation point,
$p_{l,k} \to (1 - t)^2\, \Phi_p\big(k(1 - t)^2,\, l(1 - t)\big)$, with the
scaling function

$$\Phi_p(\xi, \eta) = \eta\, (2\pi\xi^3)^{-1/2} \exp(-\eta^2/2\xi). \tag{26}$$

Thus, the characteristic path length diverges near the gelation point,
$l \sim (1 - t)^{-1}$.

At the critical point, the path length density becomes

$$\mathbf{p}_{l,k} \simeq l\, (2\pi k^3)^{-1/2} \exp(-l^2/2k). \tag{27}$$

It is evident that the typical path length scales as the square root of the
component size

$$l \sim k^{1/2}. \tag{28}$$

For finite systems, the scaling law for the typical path length (28) combined
with the characteristic component size (11) leads to the following
characteristic path length

$$l \sim N^{1/3}. \tag{29}$$

One can deduce several other scaling laws and finite-size scaling functions
underlying the path density. For example, substituting the gelation time
$1 - t_g \sim N^{-1/3}$ into the total number of paths $g = (1 - t)^{-1}$
yields $g \sim N^{1/3}$.

V. CYCLES

Each component has a certain number of nodes and links. The complexity of a
component is defined as the number of links minus the number of nodes.
Components with complexity $-1$ are trees; components with complexity $0$ and
$1$ are termed unicyclic and bicyclic correspondingly. Finite components are
predominantly trees. We have seen that the overall number of links is
proportional to $N$ and that the overall number of self-links is of the order
unity. The overall numbers of trees and of unicyclic components mirror this
behavior. Generally, the number of components of complexity $R$ is
proportional to $N^{-R}$ (this result is well-known, see e.g. [3, 21] and
especially [29]). Therefore, it suffices to characterize trees and unicyclic
components only.

Each unicyclic component contains a single cycle. Cycles are an important
characteristic of a graph [30, 31]. In this section, we analyze cycles and
unicyclic components using the rate equation approach. We first note that
cycles in random graphs were also studied using various other approaches:
Janson employs probabilistic and combinatorial techniques; Marinari and
Monasson assign an Ising spin to each node and deduce certain properties of
loops from the partition function of the Ising model; Burda et al. modify a
random graph model to favor the creation of short cycles, and examine the
model using a diagrammatic technique. A number of authors also studied cycles
on information networks like the Internet (see [35] and references therein).



A. Infinite System 

There is a significant difference between the distribu- 
tion of trees and unicyclic components. In the thermo- 
dynamic limit, the number of trees is extensive and as a 
result, it is a deterministic, or a self-averaging quantity. 
The number of unicyclic components is not extensive, 
but rather of the order unity; as a result it is a random 
quantity with a nontrivial distribution even for infinite 
random graphs. In what follows, we study the average 
number of unicyclic components of a given size or cycle 
length. 

The average number of cycles follows directly from the 
path length density. Quite simply, when the two extremal 
nodes in a path are linked, a cycle is born. Let the number of cycles of size
$l$ at time $t$ be $w_l(t)$. It grows according to the rate equation

$$\frac{dw_l}{dt} = \frac{1}{2}\, q_{l-1}. \tag{30}$$

The right-hand side equals the link creation rate $1/(2N)$ times the total
number of paths $N q_{l-1}$; indeed, the total number of cycles of a given
length is of the order one. The cycle length distribution is

$$w_l = \frac{t^l}{2l}. \tag{31}$$

In particular, at the gelation point, the cycle length distribution is
inversely proportional to the cycle length

$$\mathbf{w}_l = (2l)^{-1}. \tag{32}$$



This result can alternatively be obtained using combina- 
torics. 

To characterize cycles in a given component size, we consider the joint
distribution $u_{l,k}$, the average number of unicyclic components of size $k$
containing a cycle of length $l$ with $1 \le l \le k$. This joint distribution
evolves according to the linear rate equation

$$\frac{du_{l,k}}{dt} = \frac{1}{2}\, p_{l-1,k} + \sum_{i+j=k} (i\, u_{l,i})(j\, c_j) - k\, u_{l,k} \tag{33}$$

for $l \ge 1$. Initially there are no cycles, and therefore
$u_{l,k}(0) = 0$. Eliminating the time dependence via the substitution
$u_{l,k} = U_{l,k}\, t^{k} e^{-kt}$, the coefficients satisfy the recursion

$$k\, U_{l,k} = \frac{1}{2}\, P_{l-1,k} + \sum_{i+j=k} (i\, U_{l,i})(j\, C_j). \tag{34}$$

Using the generating function $U_l(z) = \sum_k e^{kz}\, U_{l,k}$ this
recursion is recast into the differential equation
$(1 - G)\, \frac{dU_l}{dz} = \frac{1}{2}\, P_{l-1}$. Dividing by (4) we obtain

$$\frac{dU_l}{dG} = \frac{1}{2}\, G^{l-1}. \tag{35}$$

Integrating this equation yields the generating function

$$U_l(z) = \frac{1}{2l}\, G^{l}. \tag{36}$$

Consequently, the cycle length distribution (in finite components only) is
$\sum_k u_{l,k} = \tau^l/(2l)$, in agreement with (31) prior to the gelation
time ($t < 1$).

Additionally, the joint generating function defined as
$u(z, w) = \sum_{l,k} e^{kz}\, w^l\, u_{l,k}$ is given by

$$u(z, w) = \frac{1}{2} \ln \frac{1}{1 - w\, G(z + \ln t - t)}. \tag{37}$$

As for paths, statistics of cycles are directly coupled to statistics of
components via the generating function $G(z)$. The total number of unicyclic
components of finite size, $h = \sum_{l,k} u_{l,k}$, is therefore

$$h(t) = \frac{1}{2} \ln \frac{1}{1 - \tau}. \tag{38}$$

Below the gelation point, $h(t) = \frac{1}{2} \ln \frac{1}{1 - t}$ for
$t < 1$. The total number of unicyclic components can alternatively be
obtained by noting that (i) it satisfies the rate equation
$dh/dt = \frac{1}{2} \sum_k k^2 c_k = \frac{1}{2} M_2$, and (ii) the second
moment of the size distribution is $M_2 = (1 - t)^{-1}$ for $t < 1$, as
follows from (7).
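Before gelation, summing the cycle length distribution $w_l = t^l/(2l)$ of Eq. (31) over all lengths must reproduce $h(t) = \frac{1}{2}\ln\frac{1}{1-t}$. The sketch below checks this numerically; the function name `unicyclic_count` and the series cutoff `lmax` are illustrative assumptions.

```python
import math


def unicyclic_count(t, lmax=10_000):
    """Sum w_l = t^l / (2 l) over cycle lengths l = 1..lmax (for t < 1)."""
    return sum(t ** l / (2 * l) for l in range(1, lmax + 1))


for t in (0.1, 0.5, 0.9):
    closed_form = 0.5 * math.log(1.0 / (1.0 - t))
    print(t, unicyclic_count(t), closed_form)
```

The truncated series converges quickly away from $t = 1$; close to the gelation point the cutoff would need to grow roughly like $(1 - t)^{-1}$.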
The coefficients underlying the cycle distribution are found using contour
integration. Indeed, writing
$U_{l,k} = (2\pi i)^{-1} \oint U_l\, y^{-k-1}\, dy$ and substituting $r = l$ in
(A1) gives $U_{l,k} = \frac{1}{2}\, \frac{k^{k-l-1}}{(k-l)!}$. The cycle
length-size distribution is therefore

$$u_{l,k} = \frac{1}{2}\, \frac{k^{k-l-1}}{(k-l)!}\, t^{k} e^{-kt}. \tag{39}$$

The smallest cycle, $l = 1$, is a self-connection, and the average number of
such cycles is $u_{1,k} = \frac{1}{2}\, t\, k c_k$. The largest cycles are
rings, $l = k$, and their total number is on average
$u_{k,k} = \frac{1}{2k}\, t^{k} e^{-kt}$.

The large-$k$ behavior of the cycle length distribution is found following
the same steps leading to (25)

$$u_{l,k} \simeq (8\pi k^3)^{-1/2}\, t^{k} e^{k(1-t)}\, e^{-l^2/2k}. \tag{40}$$

This distribution is self-similar in the vicinity of the gelation transition,
$u_{l,k}(t) \to (1 - t)^3\, \Phi_u\big(k(1 - t)^2,\, l(1 - t)\big)$, with the
scaling function

$$\Phi_u(\xi, \eta) = (8\pi\xi^3)^{-1/2} \exp(-\eta^2/2\xi). \tag{41}$$

We see that the cycle length is characterized by the same scale as the path
length, $l \sim (1 - t)^{-1}$. At the gelation point, the distribution is

$$\mathbf{u}_{l,k} \simeq (8\pi k^3)^{-1/2} \exp(-l^2/2k). \tag{42}$$

Fixing the component size, the typical cycle length behaves as the typical
path length, $l \sim k^{1/2}$.

The size distribution of unicyclic components is found from the joint
distribution, $v_k = \sum_l u_{l,k}$. Using (39) we get

$$v_k(t) = \frac{1}{2} \left( \sum_{n=0}^{k-1} \frac{k^{n-1}}{n!} \right) t^{k} e^{-kt}. \tag{43}$$

This distribution can alternatively be derived from the linear rate equation

$$\frac{dv_k}{dt} = \frac{1}{2}\, k^2 c_k + \sum_{i+j=k} (i\, v_i)(j\, c_j) - k\, v_k. \tag{44}$$

This equation is obtained from (33) using the equality
$k^2 c_k = \sum_l p_{l,k}$. It reflects that linking a pair of nodes in a
component generates a unicyclic component. Integrating (42) over the cycle
length, the critical size distribution of unicyclic components has an
algebraic tail

$$\mathbf{v}_k \simeq (4k)^{-1}. \tag{45}$$
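The approach to the algebraic tail (45) can be examined numerically. Summing Eq. (39) over $1 \le l \le k$ at $t = 1$ gives $\mathbf{v}_k = \frac{1}{2} e^{-k} \sum_{n=0}^{k-1} k^{n-1}/n!$; the sketch below evaluates this sum in log space and prints $4k\,\mathbf{v}_k$, which should approach one. The function name is an illustrative choice.

```python
import math


def critical_unicyclic(k):
    """v_k at t = 1: (1/2) e^(-k) sum_{n=0}^{k-1} k^(n-1) / n!."""
    log_k = math.log(k)
    terms = (math.exp((n - 1) * log_k - math.lgamma(n + 1) - k)
             for n in range(k))
    return 0.5 * math.fsum(terms)


for k in (10, 100, 1000):
    print(k, 4 * k * critical_unicyclic(k))
```

The convergence is slow, with corrections of relative order $k^{-1/2}$, so the tail is approached only gradually.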



B. Finite Systems 



We turn now to finite systems, restricting our attention to the gelation
point. The total number of unicyclic components is obtained by estimating
$h(N, t_g)$. Substituting (12) into (38) shows that the average number of
unicyclic components (and hence, cycles) grows logarithmically with the system
size (Fig. 4)

$$h(N) \simeq \frac{1}{6} \ln N. \tag{46}$$

FIG. 4: The total number of unicyclic components versus the system size at
the gelation point. Shown is $h$ versus $N$. Each data point represents an
average over 10^ independent realizations.

Comparing the path length distribution (27) and the cycle length
distribution (42), we conclude that the characteristic cycle length and the
characteristic path length obey the same scaling law, $l \sim N^{1/3}$. This
implies that the cycle length distribution in a finite system of size $N$,
$w_l(N)$, obeys the finite-size scaling law

$$w_l(N) \sim N^{-1/3}\, \Psi_w\big(l N^{-1/3}\big). \tag{47}$$

Numerical simulations confirm this behavior (Fig. 5).

In the simulations, analysis of cycle statistics requires 
us to keep track of all links. Cycles are conveniently iden- 
tified using the standard "shaving" algorithm. Dangling 
links, i.e., links involving a single-link node, are removed
from the system sequentially. The link removal procedure 
is carried until no dangling links remain. At this stage, 
the system contains no trees. Simple cycles are those 
components with an equal number of links and nodes. 
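The shaving procedure can be sketched as follows. This is one possible implementation written for illustration; the data layout (a list of node pairs), the function name `shave`, and the stack-based ordering of removals are assumptions and not details taken from the simulations described above.

```python
from collections import defaultdict


def shave(links):
    """Repeatedly remove dangling links; return the links that survive.

    `links` is a list of (a, b) node pairs; repeated pairs and self-links
    (a == b) are allowed, as in the multi-graph model.
    """
    incident = defaultdict(set)  # node -> indices of links touching it
    degree = defaultdict(int)    # a self-link contributes 2 to the degree
    for idx, (a, b) in enumerate(links):
        incident[a].add(idx)
        incident[b].add(idx)
        degree[a] += 1
        degree[b] += 1

    alive = [True] * len(links)
    stack = [node for node, d in degree.items() if d == 1]
    while stack:
        node = stack.pop()
        if degree[node] != 1:
            continue  # its last link was already removed from the other end
        (idx,) = (i for i in incident[node] if alive[i])
        alive[idx] = False
        a, b = links[idx]
        other = b if a == node else a
        degree[node] -= 1
        degree[other] -= 1
        if degree[other] == 1:
            stack.append(other)  # the neighbor has become a dangling end

    return [link for link, keep in zip(links, alive) if keep]


# A triangle 0-1-2 with a two-link tail 2-3-4, plus a separate tree 5-6.
example = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (5, 6)]
print(shave(example))
```

After shaving, each remaining connected piece can be tested for being a simple cycle by comparing its number of links with its number of nodes.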

The extremal behaviors of the finite-size scaling function are as follows

$$\Psi_w(\eta) \simeq \begin{cases} (2\eta)^{-1} & \eta \to 0, \\ \exp(-C\eta^{3/2}) & \eta \to \infty. \end{cases} \tag{48}$$

The small-$\eta$ behavior follows from (32). Statistics of extremely large
cycles can be understood by considering the largest possible cycles. When
there are $n = N/2$ links, the largest possible cycle has length $l = N/2$.
Its likelihood $w(n, 2n)$ is obtained using combinatorics

$$w(n, 2n) = \binom{2n}{n} \times \frac{n!}{2n} \times (2n)^{-n}. \tag{49}$$

There are $\binom{2n}{n}$ ways to choose the nodes participating in the cycle
and the next term is the number of ways to arrange them in a cycle. The
corrective factor $2n$ accounts for rotation and reflection symmetries. The
last term is the probability that each pair of consecutive nodes is linked.
The large-$n$ asymptotic behavior is

$$w(n, 2n) \simeq \frac{1}{\sqrt{2}\, n} \left( \frac{2}{e} \right)^{n}. \tag{50}$$

Therefore, $w(n, 2n) \sim \exp(-CN)$. Substituting $l \sim N$ into the scaling
form (47) leads to the super-exponential behavior
$\Psi_w(\eta) \sim \exp(-C\eta^{3/2})$, see Fig. 6.

FIG. 5: Finite-size scaling of the cycle-length distribution. Shown is
$2\eta\, \Psi_w(\eta)$ versus $\eta$ obtained using systems with size
$N = 10^4$, $10^5$, and $10^6$. The data represents an average over $10^6$
independent realizations.

Typically, cycles are of size $N^{1/3}$. The average moments
$\langle l^n(N) \rangle = \sum_l l^n\, w_l(N) / \sum_l w_l(N)$ reflect this
law. However, the algebraic divergence, $w_l \sim l^{-1}$, leads to a
logarithmic correction as follows from (46)-(48):

$$\langle l^n(N) \rangle \sim N^{n/3}\, [\ln N]^{-1}. \tag{51}$$

The behavior of the average cycle length is verified numerically (Fig. 7).

Finite-size scaling of other cycle statistics such as the joint distribution can be constructed following the same procedure. For example, the size distribution of unicyclic components should follow the scaling form

$$
v_k(N) \sim N^{-2/3}\,\psi_v\!\left(k/N^{2/3}\right).
\tag{52}
$$

The scaling function diverges, $\psi_v(\zeta) \simeq (4\zeta)^{-1}$, for $\zeta \to 0$.

FIG. 6: The tail of the scaling function. Shown is $2\eta\,\psi_w(\eta)$ versus $\eta^{3/2}$.

FIG. 7: The average cycle size at the gelation point. Shown is $\langle l(N)\rangle\,h(N)$ versus $N$. Each data point represents an average over $10^6$ independent realizations.

VI. THE FIRST CYCLE

The above statistical analysis of cycles characterizes the average behavior but not necessarily the typical one because the number of cycles is a fluctuating quantity. There are numerous interesting features concerning cycles that are not captured by the average number of cycles. For instance, what is the probability that the system does not contain a cycle up to time $t$? It suffices to answer this question in the pre-gel regime as the giant component certainly contains cycles.

FIG. 8: The distribution of the number of cycles. Shown is $s_n$ versus $n$ at the gelation point. The system size is N = 10^ and an average over 10^ realizations has been performed. A Poissonian distribution with an identical average is also shown for reference.

Let $s_0(t)$ be the (survival) probability that the system does not contain a cycle at time $t$. The cycle production rate is $J = \frac{1}{2(1-t)}$. The number of cycles is finite in the pre-gel regime, since cycles are independent of each other in the $N \to \infty$ limit. This assertion (supported by numerical simulations, see Fig. 8) implies that the cycle production process is completely random. The cycle production rate characterizes the survival probability $s_0$ as follows



$$
\frac{ds_0}{dt} = -J s_0.
\tag{53}
$$

The initial condition is $s_0(0) = 1$. As a result, the survival probability is

$$
s_0(t) = (1-t)^{1/2}
\tag{54}
$$

for $t < 1$. The survival probability vanishes beyond the
gelation point, $s_0(t) = 0$ for $t > 1$. This reiterates that in
the thermodynamic limit, a cycle is certain to form prior 
to the gelation transition |^. 
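The survival probability can be probed with a small Monte Carlo experiment. The sketch below is not the simulation code behind the figures; it assumes that links are placed between endpoints drawn independently and uniformly (so self-links and repeated links are allowed), that exactly $\lfloor Nt/2 \rfloor$ links are present at time $t$, and that every link landing inside an existing component closes one cycle. The helper names `find`, `count_cycles` and `estimate` are illustrative.

```python
import math
import random


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def count_cycles(n_nodes, t, rng):
    """Place floor(n_nodes * t / 2) random links; count links that close a cycle."""
    parent = list(range(n_nodes))
    cycles = 0
    for _ in range(int(n_nodes * t / 2)):
        a = find(parent, rng.randrange(n_nodes))
        b = find(parent, rng.randrange(n_nodes))
        if a == b:
            cycles += 1      # both endpoints already in one component
        else:
            parent[a] = b    # merge two distinct components
    return cycles


def estimate(n_nodes=1000, t=0.5, runs=4000, seed=1):
    """Return (fraction of cycle-free realizations, mean number of cycles)."""
    rng = random.Random(seed)
    counts = [count_cycles(n_nodes, t, rng) for _ in range(runs)]
    s0 = sum(c == 0 for c in counts) / runs
    mean = sum(counts) / runs
    return s0, mean


if __name__ == "__main__":
    s0, mean = estimate()
    print("s0 estimate:", s0, "theory:", math.sqrt(1 - 0.5))
    print("mean cycles:", mean, "theory:", 0.5 * math.log(1 / (1 - 0.5)))
```

For $t$ well below the gelation point and moderately large $N$, the cycle-free fraction should lie close to $(1-t)^{1/2}$ and the mean number of cycles close to $\frac{1}{2}\ln\frac{1}{1-t}$, up to statistical noise and finite-size corrections.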

Since the number of cycles produced is of the order of 
one in the pre-gel regime, one may expect that statistical 
properties of cycles strongly depend on their generation 
number or alternatively on their creation time. This is 
manifested by the first cycle. The quantity $dt\, s_0\, \frac{dw_l}{dt}$ is the probability that (i) the system contains no cycles at time $t$, (ii) a cycle is produced during the time interval $(t, t + dt)$, and (iii) its length is $l$. Summing these probabilities gives the probability that the first cycle, produced sometime during the pre-gel regime, has length $l$:

$$
f_l = \int_0^1 dt\, s_0\, \frac{dw_l}{dt} = \frac{1}{2}\int_0^1 dt\,(1-t)^{1/2}\, t^{l-1}.
\tag{55}
$$



Summing these quantities, we verify the normalization $\sum_{l \ge 1} f_l = \frac{1}{2}\int_0^1 dt\,(1-t)^{-1/2} = 1$.



The length distribution of the first cycle can be expressed in terms of the beta function, $f_l = \frac{1}{2} B(3/2, l)$, or alternatively

$$
f_l = \frac{\sqrt{\pi}}{4}\,\frac{\Gamma(l)}{\Gamma(l + 3/2)}.
\tag{56}
$$

The probability distribution $f_l$ has an algebraic tail,

$$
f_l \simeq C\, l^{-3/2},
\tag{57}
$$

with $C = \sqrt{\pi}/4$ for $l \gg 1$. The tail exponent characterizing the distribution of the first cycle is larger compared with the exponent characterizing all cycles, reflecting the fact that the first cycle is created earlier.
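A short numerical check of the first-cycle distribution may be useful here. The sketch assumes the closed form $f_l = \frac{\sqrt{\pi}}{4}\,\Gamma(l)/\Gamma(l+3/2)$ and the tail amplitude $C = \sqrt{\pi}/4$; the function names are illustrative.

```python
import math


def f_first_cycle(l: int) -> float:
    """f_l = (sqrt(pi)/4) * Gamma(l) / Gamma(l + 3/2), evaluated through lgamma."""
    return 0.25 * math.sqrt(math.pi) * math.exp(math.lgamma(l) - math.lgamma(l + 1.5))


def partial_sum(l_max: int) -> float:
    """Sum of f_l over 1 <= l <= l_max."""
    return sum(f_first_cycle(l) for l in range(1, l_max + 1))


def tail_ratio(l: int) -> float:
    """f_l divided by the asymptotic form (sqrt(pi)/4) * l**(-3/2)."""
    return f_first_cycle(l) / (0.25 * math.sqrt(math.pi) * l ** -1.5)


if __name__ == "__main__":
    print("f_1 =", f_first_cycle(1))
    print("partial sum up to 10^5:", partial_sum(10 ** 5))
    print("tail ratio at l = 10^4:", tail_ratio(10 ** 4))
```

Because the tail decays as $l^{-3/2}$, the partial sums converge to one slowly: the deficit after $l_{\max}$ terms is $\frac{1}{2}B(1/2, l_{\max}+1)$, which behaves as $\frac{\sqrt{\pi}}{2}\, l_{\max}^{-1/2}$.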

Similarly, one can obtain additional properties of the first cycle. We mention the probability $F_k$ that the first unicyclic component has size $k$,

$$
F_k = \int_0^1 dt\, s_0\, \frac{1}{2} k^2 c_k = \frac{1}{2}\,\frac{k^k}{k!}\, I_k
\tag{58}
$$

with the integral $I_k = \int_0^1 dt\,(1-t)^{1/2}\, t^{k-1} e^{-kt}$. This integral can be expressed in terms of the confluent hypergeometric function. Its asymptotic behavior can be readily found by noting that the integrand has a sharp maximum in the region $1 - t \sim k^{-1/2}$, leading to $I_k \simeq 2^{-1/4}\,\Gamma(3/4)\,k^{-3/4} e^{-k}$. Using this in conjunction with Stirling's formula, the size distribution has the algebraic tail

$$
F_k \simeq C\, k^{-5/4}
\tag{59}
$$

with $C = 2^{-7/4}\pi^{-1/2}\,\Gamma(3/4)$ for $k \gg 1$.
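The algebraic tail can be compared with a direct numerical evaluation. The sketch below assumes $F_k = \frac{1}{2}\frac{k^k}{k!} I_k$ with $I_k = \int_0^1 dt\,(1-t)^{1/2} t^{k-1} e^{-kt}$ and uses a plain midpoint rule; the step count and function names are illustrative choices.

```python
import math


def scaled_integral(k: int, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of exp(k) * I_k, keeping the integrand of order one."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.sqrt(1.0 - t) * math.exp((k - 1) * math.log(t) - k * t + k)
    return total * h


def first_unicyclic(k: int) -> float:
    """F_k = (1/2) (k^k / k!) I_k, with the prefactor assembled in log space."""
    log_prefactor = k * math.log(k) - math.lgamma(k + 1) - k  # log(k^k e^-k / k!)
    return 0.5 * math.exp(log_prefactor) * scaled_integral(k)


def tail(k: int) -> float:
    """Asymptotic form C k^(-5/4) with C = 2^(-7/4) pi^(-1/2) Gamma(3/4)."""
    c = 2.0 ** (-7.0 / 4.0) * math.gamma(0.75) / math.sqrt(math.pi)
    return c * k ** (-5.0 / 4.0)


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(k, first_unicyclic(k) / tail(k))
```

The ratio should drift toward one as $k$ increases; the leading correction to the saddle-point estimate is expected to decay only as $k^{-1/2}$, so the convergence is slow.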



Under the assumption that cycle production is completely random, the number of cycles obeys Poisson statistics. The probability that there are $n$ cycles, $s_n$, then satisfies the straightforward generalization of Eq. (53), viz. $\frac{ds_n}{dt} = J[s_{n-1} - s_n]$ with the initial condition $s_n(0) = \delta_{n,0}$. The solution is the Poisson distribution $s_n = \frac{h^n}{n!} e^{-h}$, see Fig. 8. Explicitly, the distribution reads

$$
s_n(t) = \frac{(1-t)^{1/2}}{n!}\left[\frac{1}{2}\ln\frac{1}{1-t}\right]^n.
\tag{60}
$$

The cumulative distribution $S_n(t) = s_0(t) + \cdots + s_n(t)$ is plotted in Fig. 9.
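The Poisson form can be verified against its rate equation numerically. The sketch below assumes $J = \frac{1}{2(1-t)}$ and $s_n(t) = \frac{(1-t)^{1/2}}{n!}\left[\frac{1}{2}\ln\frac{1}{1-t}\right]^n$, and checks $ds_n/dt = J[s_{n-1} - s_n]$ with a central finite difference; the function names and the step size are illustrative.

```python
import math


def cycle_rate(t: float) -> float:
    """Cycle production rate J = 1 / (2 (1 - t)) in the pre-gel regime t < 1."""
    return 1.0 / (2.0 * (1.0 - t))


def s(n: int, t: float) -> float:
    """Probability of exactly n cycles at time t (Poisson with mean h)."""
    if n < 0:
        return 0.0
    h = 0.5 * math.log(1.0 / (1.0 - t))
    return math.sqrt(1.0 - t) * h ** n / math.factorial(n)


def residual(n: int, t: float, eps: float = 1e-6) -> float:
    """ds_n/dt (central difference) minus J (s_{n-1} - s_n); should be near zero."""
    derivative = (s(n, t + eps) - s(n, t - eps)) / (2.0 * eps)
    return derivative - cycle_rate(t) * (s(n - 1, t) - s(n, t))


if __name__ == "__main__":
    t = 0.9
    print("normalization:", sum(s(n, t) for n in range(60)))
    for n in range(4):
        print(n, s(n, t), residual(n, t))
```

The same function `s` also gives the cumulative distribution by summing over $0 \le j \le n$.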

The Poisson distribution (60) can also be used to calculate $f_{n,l}$, the size distribution of the $n$th cycle. We merely quote the large-$l$ tail behavior

$$
f_{n,l} \simeq \frac{\sqrt{\pi}}{4\,(n-1)!}\; l^{-3/2}\left[\frac{1}{2}\ln l\right]^{n-1}.
\tag{61}
$$

Indeed, summation over the cycle generation reproduces the overall cycle distribution (32).

FIG. 9: The cumulative distribution $S_n(t) = \sum_{0 \le j \le n} s_j(t)$ versus $t$ for $n = 0, 1, 2, 3$.

In finite systems, it is possible that no cycles are created by the gelation time. This probability decreases algebraically with the system size, as seen by substituting the width of the critical window, $1 - t \sim N^{-1/3}$, into Eq. (54),

$$
s_0 \sim N^{-1/6}.
\tag{62}
$$



This prediction agrees with simulations, see Fig. 10. In practice, this slow decay indicates that a relatively large system may contain no cycles after $N/2$ links are placed. Generally, the probability that there is a finite number of cycles increases with the number of cycles

$$
s_n \sim \frac{N^{-1/6}}{n!}\left[\frac{1}{6}\ln N\right]^n.
\tag{63}
$$



The length distribution of the first cycle is characterized by the same $l \sim N^{1/3}$ size scale as the overall cycle distribution. We focus on the behavior of the moments

$$
\langle l^n \rangle \sim N^{n/3 - 1/6}.
\tag{64}
$$

This behavior is obtained from the distribution (57) that should be integrated up to the appropriate cutoff, i.e., $\langle l^n \rangle \sim \int^{N^{1/3}} dl\, l^n\, l^{-3/2}$. As a result, the average size of the first cycle, $\langle l \rangle \sim N^{1/6}$, is much smaller than the characteristic cycle size. Moments corresponding to the size of the first unicyclic component grow as follows

$$
\langle k^n \rangle \sim N^{2n/3 - 1/6},
\tag{65}
$$

as obtained from (59). Consequently, the average size of the first unicyclic component, $\langle k \rangle \sim N^{1/2}$, is smaller than the characteristic component size.
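The step behind the moments of the first unicyclic component can be spelled out. The following is a sketch that assumes the algebraic tail $F_k \sim k^{-5/4}$ is cut off at the characteristic component size $k \sim N^{2/3}$ and that $n > 1/4$, so the integral is dominated by its upper limit:

```latex
\langle k^n \rangle \sim \int^{N^{2/3}} dk\, k^{n}\, k^{-5/4}
  \sim \left(N^{2/3}\right)^{n - 1/4}
  = N^{2n/3 - 1/6},
\qquad
\langle k \rangle \sim N^{1/2} \ll N^{2/3}.
```

The same argument with the tail $f_l \sim l^{-3/2}$ and the cutoff $l \sim N^{1/3}$ gives $\langle l^n \rangle \sim N^{n/3 - 1/6}$ for $n > 1/2$.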



VII. CONCLUSIONS 

In summary, we have extended the kinetic theory de- 
scription of random graphs to structures such as paths 
and cycles. Modeling the linking process dynamically 
leads to an aggregation process for both components and 
paths. The density of paths in finite components is coupled to the component size distribution via nonlinear rate equations while the average number of cycles is coupled to the path density via linear rate equations. Both path and cycle length distributions are coupled to the component size distribution.

FIG. 10: The survival probability versus the system size. Shown is $s_0(N)$ versus $N$ at the gelation point, i.e., when $N/2$ links are placed. Each data point represents an average over 10^ realizations.

Generally, size distributions decay exponentially away 
from the gelation point, but at the gelation time, alge- 
braic tails emerge. As the system approaches this critical 
point, the size distributions follow a self-similar behavior 
characterized by diverging size scales. 

The kinetic theory approach is well-suited for treat- 
ing infinite systems. The complementary behavior for fi- 
nite systems can be obtained from heuristic scaling argu- 
ments. This approach yields scaling laws for the typical 
component size, path length, and cycle length at the gela- 
tion point. These scaling laws can be formalized using 
finite-size scaling forms, i.e., self-similarity as a function 
of the system size, rather than time. Obtaining the exact 
form of these scaling functions is a nice challenge in par- 
ticular for the most fundamental quantity, the component 
size distribution that is characterized by a non-monotonic 
scaling function. 

The kinetic theory approach seems artificial at first 
sight. Indeed, graphs are discrete in nature and there- 
fore combinatorial approaches appear more natural. Yet, 
once the rate equations are formulated, the analysis is 
straightforward. Utilizing the continuous time variable 
allows us to employ powerful analysis tools. Moreover, 
some of the kinetic theory results are less cumbersome 
compared with the combinatorial results. 

The same methodology can be extended to analyze other features of random graphs. For example, correlations between the node degree and the cluster size can be analyzed using bi-aggregation rate equations. It is quite possible that structural properties in other aggregation processes, for example, polymerization with a sum kernel [17], and in other variants of random graphs such as small-world networks [37], can be analyzed using kinetic theory.

One could try to utilize kinetic theory to probe the distribution of various families of subgraphs. We have limited ourselves to cycles since they, alongside trees, do appear in random graphs while more interconnected families of subgraphs are very rare [5]. Yet in biological
and technological networks certain interconnected fam- 
ilies of subgraphs do appear. Such populated families 
of subgraphs, motifs, are believed to carry information 
processing functions [s^ [s^. It will be interesting to 
use kinetic theory to analyze motifs in special random 
graphs. 

This research was supported by the DOE (W-7405- 
ENG-36). 



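APPENDIX A: CONTOUR INTEGRATION

Let $A(z) = \sum_k A_k e^{kz}$ be the generating function of the coefficients $A_k$. For the family of generating functions $A(z) = G^r(z)$ with $G(z)$ satisfying $G e^{-G} = e^z$, the coefficients $A_k$ can be obtained via contour integration in the complex $y$ plane where $y = e^z$ as follows

$$
\begin{aligned}
A_k &= \frac{1}{2\pi i}\oint dy\, \frac{G^r}{y^{k+1}} \\
&= \frac{1}{2\pi i}\oint dG\, \frac{dy}{dG}\, \frac{G^r}{G^{k+1} e^{-(k+1)G}} \\
&= \frac{1}{2\pi i}\oint dG\, (1-G)\, e^{-G}\, \frac{G^r}{G^{k+1} e^{-(k+1)G}} \\
&= \frac{1}{2\pi i}\oint dG\, \sum_n \frac{k^n}{n!}\left(G^{n+r-k-1} - G^{n+r-k}\right) \\
&= \frac{k^{k-r}}{(k-r)!} - \frac{k^{k-r-1}}{(k-r-1)!} = r\,\frac{k^{k-r-1}}{(k-r)!}.
\end{aligned}
\tag{A1}
$$

Since $G e^{-G} = y$, it is convenient to perform the integration in the complex $G$ plane. In writing the third line, we used $\frac{dy}{dG} = (1-G)\, e^{-G}$.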
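The closed form for the coefficients can be cross-checked with exact rational arithmetic. The sketch below assumes that $G$ is the power-series solution of $G e^{-G} = y$ with $G(0) = 0$, i.e. $G = \sum_{k \ge 1} \frac{k^{k-1}}{k!}\, y^k$, raises it to the power $r$ by truncated series multiplication, and compares each coefficient with $r\,k^{k-r-1}/(k-r)!$. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def tree_series(order):
    """Coefficients g[k] of G(y) = sum_{k>=1} k^(k-1)/k! * y^k, up to y^order."""
    return [Fraction(0)] + [
        Fraction(k ** (k - 1), factorial(k)) for k in range(1, order + 1)
    ]


def multiply(a, b, order):
    """Product of two truncated power series, keeping terms up to y^order."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j > order:
                break
            out[i + j] += ai * bj
    return out


def power_coefficients(r, order):
    """Coefficients of G(y)**r up to y^order."""
    g = tree_series(order)
    result = [Fraction(1)] + [Fraction(0)] * order
    for _ in range(r):
        result = multiply(result, g, order)
    return result


def closed_form(r, k):
    """r * k^(k-r-1) / (k-r)!; the exponent is -1 when k == r, hence Fraction."""
    return Fraction(r) * Fraction(k) ** (k - r - 1) / factorial(k - r)


if __name__ == "__main__":
    order = 8
    for r in (1, 2, 3):
        series = power_coefficients(r, order)
        print(r, all(series[k] == closed_form(r, k) for k in range(r, order + 1)))
```

Exact fractions are used instead of floating point so that agreement is an equality test rather than a tolerance check.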